ABSTRACT

“Does placing workers together based on their personality
give better performance results in cooperative crowdsourcing
settings, compared to non-personality based crowd team
formation?” In this work we examine the impact of personality
compatibility on the effectiveness of crowdsourced
team work. Using a personality-based group dynamics approach,
we examine two main types of personality combinations
(matching and crashing) on two main types of tasks
(collaborative and competitive). Our experimental results show
that personality compatibility significantly affects the quality
of the team’s final outcome, the quality of interactions and
the emotions experienced by the team members. The present
study is the first to examine the effect of personality on team
results in crowdsourcing settings, and it has practical implications
for the better design of crowdsourced team work.
INTRODUCTION

Efficient team collaboration is a decisive factor for the success
of any group project. Team formation, i.e. the selection
of which individuals will become part of the team, is one of
the most critical steps in this process. Among the factors that
play a role in successful team formation are the individual
team members’ personalities. Indeed, as the literature indicates,
teams with matching personalities cooperate more efficiently
than teams where the participants’ personalities
do not match, or even crash [2]. This knowledge, if properly
exploited and applied at large scale, could be valuable for
enhancing team output in crowd work settings.

Cooperative crowdsourcing is a relatively new form of
crowdsourcing, in which crowd workers interact to accomplish
tasks either collaboratively or competitively, in contrast to
typical crowdsourcing applications that comprise independent
worker effort. A cooperating crowd can accomplish
more complex, interconnected tasks, thanks
to the combination of various skills and knowledge backgrounds,
with example applications including ideation contests,
knowledge synthesis, collaborative problem solving and
citizen science, to mention just a few [16, 26]. Yet, similarly
to micro-task-based crowdsourcing, cooperative crowdsourcing
also faces quality concerns. Although certain works
try to improve crowd team efficiency, most often by examining
the proper incentives to give [21], very few works
to date exploit group dynamics and none exploits personality
compatibility among the crowd team members.

Research on group dynamics and individual personality is
vast in the fields of social psychology and personality psychology
respectively. The formation of groups by matching
the individual members’ personalities is a field that has been
studied less, mostly due to its multi-factorial nature (i.e.
personality factors, situational factors, interaction factors) [17].
Nevertheless, certain approaches and assessment tools can be
found in this direction (see related literature) and this
psychological knowledge can be exploited to assist group formation
in cooperative crowdsourcing. These approaches need to be
carefully examined prior to any application in crowd
environments, due to the differences between crowd teams and
the teams typically examined in social psychology. Indeed,
whereas the typical team settings examined by group dynamics
studies are mostly face-to-face, the people in cooperative
crowdsourcing need to work from a distance and mostly
asynchronously (due to the different time zones, availabilities and
work schedules). Therefore, the idea of bringing together
crowd workers based on their personalities is something that
needs to be tested, and this is exactly what this work is about.

In this paper we examine the impact of personality compatibility
on the effectiveness and final output quality of crowd
teamwork. Based on the DISC personality test [31] and the
interactionist approach (both borrowed from group dynamics
studies) we examine two main types of team personality
compatibility, crashing and matching, on two cooperative task
types, collaborative and competitive, applied to advertisement
development. Our study has practical implications for
the design of cooperative crowdsourcing and it can be used
by task designers as a relatively simple way (requiring only an
initial personality test) of ensuring high-quality group results.

RELATED WORK 

Crowdsourcing is a successful paradigm, with high commercial,
educational and academic potential. Most commercial
crowdsourcing applications are based on micro-tasks, which
are given to independent workers and do not require cooperation
[44, 3]. Examples of this kind of crowd work include text
translation, sentiment analysis, audio transcription and image
recognition. Recent research explores using crowdsourcing
for more complex tasks (e.g. [29]), which are often
interdependent, of subjective nature and based on worker
cooperation [26]. Examples of such tasks include news article
writing, product design or collaborative software development.

The main concern that often hinders trust in crowdsourcing,
either micro-task or cooperative-based, is the low quality of the
final outcome. One line of work explores the use of automated
means to improve quality without exceeding the available
task budget. Indicatively, Karger et al. [23] use plurality
optimization mechanisms for finding the optimal number
of workers to allocate per micro-task, in order to ensure
high quality while minimizing task cost. Other works
apply preprocessing to filter out low-quality workers, based
on reputation mechanisms, screening mechanisms [10],
pre-qualification tests, or golden data [22]. Post-processing is
also applied to refine and evaluate task quality after the tasks
are completed [46], or while they are being processed [38].

Another line of studies points out that enhancing crowd work
quality needs a change of viewpoint: from considering workers
as homogeneous, interchangeable units (the typical
crowdsourcing model) to taking into account the human factor,
i.e. the emotional and cognitive personal characteristics of the
workers. Motivation is the factor most extensively examined
and many works have found significant correlations between
various incentives and task output quality, especially as far as
creative or innovative tasks are concerned (indicatively [21]).
Morris et al. [33] use priming to increase the performance
output of workers in creative crowdsourcing tasks. Their results
confirm that this technique helps improve worker performance.
Sampath et al. [39] use cognitive-inspired features
in task design as a technique to improve the quality of the
crowd work.

A few works also explore the use of personality in crowdsourcing.
Indicatively, Kazai et al. [24, 25] examine the quality
of the workers’ output in relation to their personality traits.
Their results confirm a strong correlation between worker
personality traits and their work-related traits (tasks completed,
task completion time and accuracy). Other works in
this direction also use personality aspects to predict differences
between worker stereotypes (competent/incompetent,
meticulous/sloppy etc.) [11, 45]. The above works are in line
with the present study regarding the importance of taking
into account the personality of crowd workers for their better
selection and allocation to the tasks. However, most current
studies focus on individual workers, while the present
work focuses on the use of personality for the composition of
worker teams, therefore targeting not only worker-to-task but
also worker-to-worker matching.

From a psychology perspective, the literature focuses either on
the individual or on the group. Regarding the individual,
individual characteristics, both cognitive and biological, can
already be measured effectively (e.g. conservatism [47, 4,
12], sensation-seeking [49] etc.). Also, valid theories and
tests that categorize people’s personality traits,
like Holland’s 6 personality types [20], Costa and McCrae’s
[8] NEO-PI-R five factor analysis, Cattell’s 16 factors [6], the
Myers-Briggs Type Indicator [34], or Eysenck’s supertraits
[13], study people as individuals and not as part of groups.

Regarding groups, there is extensive literature in personality
and social psychology about groups and their members’ behavior.
Less research, however, exists on which specific personalities
one can bring to a group in order to increase efficiency and
how a specific individual with certain personality traits will
behave once in a group. Relevant to this, the person-situation
debate in psychology (whether a person’s personality or the
situation is the main determinant of her behavior and
performance) can be summarized by three main theories [18]: Trait
theories hold that personality is the main factor; Situational
theories hold that the situation is the main behavior
factor; and Interaction theories hold that behavior is
a synthesis of the two [5]. In this work, we decided to follow
the interactionist approach as it includes more factors,
it is supported by long-term research data (e.g. see the
15-year long review by [37]) and its usefulness has already been
demonstrated in HCI [35]. Consequently, we rely on variants
of both situational elements (e.g. nature of the task) and
individual personality traits in order to extract the variables
affecting group performance that will be used in this paper.

Regarding personality factors, there are few theories and
tools that study the individual as a part of the group. The
DISC personality test [31] identifies four main types of
group members: 2 leader types with high Dominance
(task-oriented, focus on task completion) or high Inducement
(socio-emotionally oriented, focus on interpersonal relations)
and 2 non-leader types with high Submission
(socio-emotionally oriented) or high Compliance (task-oriented).
According to Belbin’s approach [2], effective teams include
people of 8 different types (Chairman, Shaper, Plant, Monitor
evaluator, Resource investigator, Teamworker, Company
worker, Completer) and a team is successful if all of the above
roles are covered.

Regarding situational factors, the literature emphasizes the
importance of the task’s nature, i.e. that a person’s work
behavior is highly related to the task that the person is
involved in. After reviewing the Steiner [41] task typology and
other available literature [30], we identified 7 main task types.
Task type I denotes whether a task can be divided into
further subtasks (values: 1. Divisible: existence of subtasks, 2.
Unitary: no subtasks). Task type II denotes whether the team
focuses on the quality or the quantity of the task (values: 1.
Maximizing: importance placed on quantity, 2. Optimizing:
importance placed on quality). Task type III denotes the
mechanism used by the team to combine the contributions of
its individual members (values: 1. Additive: individual inputs
are added, 2. Compensatory: group product is the average
of individual judgments, 3. Disjunctive: product is selected
from pool of individual judgments, 4. Conjunctive: product
is a synthesis of all member contributions, 5. Discretionary:
group can decide how individual inputs relate to group
product). Task type IV denotes the way of cooperation among the
team members (values: 1. Collaborative: commonality of
interests, 2. Competitive: conflict of interests, 3. Mixed motive:
both common and conflicting interests). Task type V denotes
the level of difficulty of the task (values: 1. Easy, 2. Difficult).
Task type VI denotes the task’s duration (values: 1. Short, 2.
Long). Task type VII denotes the subjectivity of the task
(values: 1. Intellective: a correct answer exists, 2. Judgmental: no
demonstrably correct answer).

In addition to the 7 task types, another very important factor
of group productivity is the number of group members.
According to the Ringelmann effect [28], there is an inverse
relationship between the number of people in the group and
individual performance. Possible values are: 2-7, 8-9 or 9-16
members, since as the literature shows, significant qualitative
differences are observed in the behavior of groups below,
around and over 8 people [42]. Also, the amount of control
given to the group leader in a given situation seems to play a
crucial role. According to contingency theories, the leader’s
type and the situational control she might have affect the
group’s outcome [14]. Last, groups seem to interact according
to four main group interaction types [18]: 1. Interacting
(natural processes occurring during face-to-face interactions),
2. Brainstorming (synchronous technique that encourages all
ideas while withholding any criticism), 3. Nominal (both
synchronous and asynchronous technique: members first work
independently, then meet to discuss their ideas) and 4. Delphi
(asynchronous technique, similar to nominal groups: members
never meet, instead they first work independently, then
they see other members’ ideas and work again alone).

METHODOLOGY 
Research Design 

From the above, we identify 11 main personality and
situational elements that affect group performance (Table 1).
Given their potential values, these elements give rise to a
significant number of possible experimental combinations
(> 23000). In this section, we describe the decisions taken
with regard to the values of these elements, which led us to our
specific research design and suited the context of personality-
based matching in cooperative crowdsourcing.
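
The size of this design space can be checked with a short calculation. The sketch below is only illustrative: it multiplies the number of candidate values that the text lists for each of the 11 elements, and the dictionary keys are labels chosen here for readability, not identifiers used in the study.

```python
from math import prod

# Candidate values per element, as enumerated in the text above.
# Keys are illustrative labels (an assumption of this sketch).
ELEMENT_VALUES = {
    "individual_personalities": ["DISC-based", "Belbin-based"],
    "team_members": ["2-7", "8-9", "9-16"],
    "task_type_I": ["Divisible", "Unitary"],
    "task_type_II": ["Maximizing", "Optimizing"],
    "task_type_III": ["Additive", "Compensatory", "Disjunctive",
                      "Conjunctive", "Discretionary"],
    "task_type_IV": ["Collaborative", "Competitive", "Mixed motive"],
    "task_type_V": ["Easy", "Difficult"],
    "task_type_VI": ["Short", "Long"],
    "task_type_VII": ["Intellective", "Judgmental"],
    "group_interaction": ["Interacting", "Brainstorming", "Nominal", "Delphi"],
    "leader_control": ["Low", "High"],
}


def count_combinations(elements):
    """Number of distinct designs if every element can vary independently."""
    return prod(len(values) for values in elements.values())


total_combinations = count_combinations(ELEMENT_VALUES)
print(total_combinations)  # 23040, consistent with "> 23000"
```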

The first decision pertains to the personality assessment tool
that will allow the extraction of the individual personalities of
the crowd workers (Individual Personalities element). From
all the available theories and tests, the ones that have a direct
relation to team work, and not simply to individual assessment,
were chosen. In particular, the DISC test focuses on the way
that different group members will interact with each other
and the roles that they will play inside the group. In addition,
the DISC test differentiates between 4 main types (with
fluctuations in the proportion of the different dimensions),
whereas Belbin (the other candidate test) identifies 8 main
types present in ideal teams. Thus, it was decided to use the
DISC test in the current study mainly for practicality reasons,
since it leads to smaller and easier to handle worker groups.
In future work, however, the Belbin test will also be used.
Based on the DISC test, two types of groups were formed
with regard to personality compatibility:

• Groups of matching personalities. They consisted of one
Dominant personality, one Inducement personality, one or
two Submission personalities and one or two Compliance
personalities. This group type included all the DISC types
while avoiding the presence of two similar types of leaders.

• Groups of crashing personalities. They included
more than one leader of a similar type (usually D types).
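
The two composition rules can be expressed as a small classifier. This is a minimal sketch under stated assumptions, not the selection procedure actually used in the study: each worker is represented by a single DISC letter, the function name `classify_group` and the "unclassified" fallback are hypothetical, and mixed profiles (e.g. D/I) are deliberately left unclassified.

```python
from collections import Counter

LEADER_TYPES = ("D", "I")          # leader types according to the DISC test
DISC_TYPES = {"D", "I", "S", "C"}


def classify_group(member_types):
    """Label a group as 'matching', 'crashing' or 'unclassified'.

    member_types: iterable of single-letter DISC types, one per worker.
    """
    counts = Counter(member_types)

    # Crashing: more than one leader of the same type (e.g. several D types).
    if any(counts[t] > 1 for t in LEADER_TYPES):
        return "crashing"

    # Matching: exactly one D, one I, one or two S and one or two C,
    # with no other (e.g. mixed) profiles in the group.
    only_pure_types = set(counts) <= DISC_TYPES
    if (only_pure_types
            and counts["D"] == 1 and counts["I"] == 1
            and 1 <= counts["S"] <= 2 and 1 <= counts["C"] <= 2):
        return "matching"

    # Anything else falls outside the two group types described above.
    return "unclassified"


print(classify_group(["D", "I", "S", "S", "C"]))  # matching
print(classify_group(["D", "D", "D", "S", "C"]))  # crashing
```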

Following the interactionist approach, apart from personality
elements, we also incorporated situational elements, i.e.
specific task types (elements: task types I-VII). For the context
of this research, it was decided to examine group performance
under competition as well as collaboration. Thus,
each of the basic groups was further divided into two more
categories (task type IV: Collaborative/Competitive, task type
III: Conjunctive/Disjunctive):

• Collaborative, where workers co-create a concept. 

• Competitive, where workers compete for the best concept. 

The remaining task type elements were kept stable across all
worker groups. Specifically, since our particular research design
aimed at crowdsourcing contexts, the task that the workers
would accomplish needed to be of short duration to increase
chances of task completion by the participants (task
type VI: Short) and fully computerized, so that people would
be able to perform it without leaving their PC, from a distance
and without time zone constraints. Also, since we aimed at a
broad crowdsourcing worker pool, the task should not require
prior expertise (task type V: Easy). Since we needed to examine
the influence of personality, the task should not be routine
or repetitive but rather creative, to allow the expression of
the workers’ personality. Being a creative task, there is no
correct or incorrect answer (task type VII: Judgmental). A
judgmental task would further allow diverse group processes
to emerge. In order not to interrupt the group creativity
processes, the task was also chosen to be unitary (no subtasks)
(task type I: Unitary). Since we were interested in the quality
of the final group outcome, the task’s objective should focus
on quality rather than on quantity (task type II: Optimizing).

The present study operated with groups of 5 people (# Team 
members: 5), due to the assessment tool used (minimum 4 
members) and in line with research findings showing that an 
effective group should not exceed 8 members [42]. 

With regard to leadership, we did not impose any type of
leadership and each group was allowed to perform as its members
wished (Leader control: Low). Although leaders were
identified from the initial personality test, group members
were allowed to interact freely with each other, without
knowing who the leader was, so that we could see the actual group
dynamics and not the ones we had predicted before the
interaction. This choice was meant precisely to allow us to observe
whether leadership would emerge and under which personality
combinations.

The chosen group interaction type was Delphi, since it is a
type of nominal group approach and nominal groups seem to
provide better results than other types (interacting and
brainstorming) [43]. The asynchronous nature of Delphi would
also allow the interaction of participants from different time
zones and work schedules, in a sequential rather than
simultaneous manner, which is found to be better for uncertain,
subjective tasks, like the ones used in this research [1]. The
research design presented above resulted in the following 4
experimental conditions:

1. CR/CM: Crashing Competitive. A group with crashing
personalities, working on the task competitively.

2. CR/CL: Crashing Collaborative. A group with crashing
personalities, working on the task collaboratively.

3. M/CM: Matching Competitive. A group with matching
personalities, working on the task competitively.

4. M/CL: Matching Collaborative. A group with matching
personalities, working on the task collaboratively.
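
The four conditions are the cross product of the two personality compositions and the two task versions (a 2x2 design). As an illustrative sketch, with the function name `experimental_conditions` being an assumption of this example, the condition labels can be generated as follows:

```python
from itertools import product

# Abbreviations as used in the condition labels above.
PERSONALITY = {"CR": "Crashing", "M": "Matching"}
TASK = {"CM": "Competitive", "CL": "Collaborative"}


def experimental_conditions():
    """Map each condition code (e.g. 'CR/CM') to its full name."""
    return {
        f"{p}/{t}": f"{PERSONALITY[p]} {TASK[t]}"
        for p, t in product(PERSONALITY, TASK)
    }


for code, name in experimental_conditions().items():
    print(code, "-", name)
```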

Research Hypotheses 

Given our basic question: “Does team formation based on
personality matching give better performance results in cooperative
crowdsourcing settings, compared to non-personality
based matching?”, our two main research null hypotheses are:

1. H01 - Quality of final outcome. The quality of the final
outcome of the group work will not differ significantly
among the 4 experimental conditions. In particular, the
matching personality conditions are not expected to outperform
the crashing personality conditions.

2. H02 - Group effectiveness and emotions. The quality of
the perceived group effectiveness and emotions will not
differ significantly among the 4 experimental conditions.
In particular, the participants of the matching personality
conditions are not expected to work more efficiently
or to experience fewer negative emotions (in terms of motivation,
satisfaction, frustration, confidence, etc.) than the
participants of the crashing personality conditions.

The above represent the two fundamental, generic hypotheses
that this research dealt with. Additional sub-hypotheses have
been identified and dealt with, which are not discussed in this
section for reasons of space and readability. Some of these are
presented in the results section, and the rest as part of the
future work in the discussion section. The identified hypotheses
will be analyzed qualitatively and quantitatively.

Experiment Implementation 

Task Description 

Variable                      Values
----------------------------  ------------------------------------------
Individual Personalities      DISC-based (4 types)*,
                              Belbin-based (8 types)
# team members                2-7*, 8-9, 9-16
Task type I                   Divisible, Unitary*
Task type II                  Maximizing, Optimizing*
Task type III                 Additive, Compensatory, Disjunctive*,
                              Conjunctive*, Discretionary
Task type IV                  Collaborative*, Competitive*, Mixed motive
Task type V                   Easy*, Difficult
Task type VI                  Short*, Long
Task type VII                 Intellective, Judgmental*
Group interaction             Interacting, Brainstorming, Nominal,
                              Delphi*
Leader control                Low*, High
Other environmental           Known variables to affect group work
variables                     (i.e. affecting productivity, cohesiveness,
                              creativity, etc.) will remain stable during
                              the different experiments. The same
                              guidelines for group work will be followed
                              through the study.

Table 1. Variables affecting group performance (values used in the
present research are marked with *)

According to the requirements identified in our research design,
we decided to use the task of cooperative advertisement
creation. Specifically, as also shown by Dow et al.
[9], an advertisement task fulfills several of our key criteria:
short duration, no requirement for expert or prior knowledge,
ability to express creativity and ability for both
objective and subjective measurements of quality. In
this task, groups with matching or crashing personality
combinations would be asked to create the advertisement
campaign of a new product, either competitively or
collaboratively. Generally speaking, a product’s campaign can consist
of many elements, like slogan, scenario, music, logo, etc. It
can also vary depending on the broadcasting medium
(television, radio, Internet etc.). To keep the task short,
we chose to ask workers for the product’s slogan (text up
to 50 words) and scenario (text up to 150 words), aimed at
TV broadcasting. The product to advertise was a new Active
coffee beverage, called “sCOPA”. Coffee was chosen, after
considering various candidate products, because it
is a product likely to be known to people across the globe,
with rather neutral belief connotations (e.g. religious,
political, etc.), and without being exclusively associated with any
particular brand (as would be the case e.g. for specific soft
drink products). The task was implemented in two versions:

• Competitive task version. The group is asked to create 
the final advertisement by selecting one single campaign, 
among the ones proposed by its individual members. 

• Collaborative task version. The group is asked to create 
the final advertisement by combining the campaign ideas 
proposed by its individual members. The group members 
are free to take ideas one from the other, and change the 
original texts. 

Crowdsourcing workflow 

We used the crowdsourcing platform CrowdFlower.com
mainly for its breadth of worker sample (access to 5M workers
from 154 countries in over 50 labor channels). Ethics approval
was obtained and all legal requirements for data protection
were fully followed. Participants were informed about
the academic nature of the experiments and their legal rights.
Following this, the implementation of the experimental design
was conducted in 3 rounds.

Round 1. DISC personality test 

The 1st round was an open crowdsourcing task, where workers
were invited to take the DISC personality test. This task
paid 1$. 295 workers from 59 different countries participated
in this round. Each worker was asked if she would like to
participate in the next rounds (subject to selection based on her
profile) and, in case of a positive answer, to provide us with a
contact email.

Round 2. Individual advertisements 

In the 2nd round the workers who stated interest in participating
were invited to make an individual advertisement (slogan and
scenario as described above) about the sCOPA coffee product,
through a dedicated CrowdFlower job that paid 1$. They
were instructed that their “ads should be original with a clear
market value, using simple, understandable and honest messages
and emphasizing on the unique aspects of the product”.
These instructions were meant to align worker contributions
with the final outcome quality axes that we intended to measure
at the end of the experiment (see Evaluation Metrics Design
sub-section). 185 workers participated in this round.

Round 3. Cooperative advertisement creation 
The 3rd and most important round of the experiment consisted
of selecting the workers and placing them into the
groups. Four distinct types of worker groups were created,
according to our 4 experimental conditions. Selected workers
were invited by email. Each group comprised 5 workers, who
were given a link to a Google document, on which they would
work to create the final sCOPA advertisement. This document
contained 3 parts: 1) Task instructions, 2) the 5 individual
advertisements created by the individual team members
in Round 2, and 3) document space to host the final group
advertisement. The competitive groups were instructed to read
the individual advertisements, discuss and select the best one,
without any changes in the slogan or scenario. The collaborative
groups were instructed to read the individual worker
advertisements, discuss and create one new advertisement
by merging, modifying and taking ideas from any individual
advertisement they wanted. Workers of all groups were
instructed to actively discuss and interact with the other people
in their groups, for the final group outcome. The interaction
was asynchronous, through threads of comments that the
workers would add to the Google document. Each group had
a working period of 5 days, to keep the task short. One day
before the deadline each worker group was sent a reminder,
inviting people to participate if they had not done so. To
motivate participation, workers were paid based on their level of
interaction with their groups (0.5-2$), while an extra bonus
was given to those groups that managed to make the final
advertisement (1$). 145 people participated in the 3rd round,
split into 29 groups.

Evaluation metrics design 

Following our two hypotheses, we evaluated: i) the final
group outcome and ii) the group effectiveness and emotions.
This was a multidimensional evaluation process, where both
quantitative and qualitative metrics were used.

Final group outcome evaluation 

According to Hoffman [19] the successful ad: is creative,
dramatizes and communicates the reasons to buy the product, is
honest, is simple (one message is better than two), rhymes
things, is possible, and looks for the product’s Unique Selling
Point. Based on this study, as well as on similar recent
research developments on information and content quality [7,
27], we defined five axes of final group outcome quality: 1)
Originality (How original and creative is the advertisement?),
2) Market Value (How likely is it that the advertisement will
attract customers?), 3) Simplicity (How simple and understandable
is the message of the advertisement?), 4) Honesty
(How honest is the advertisement?) and 5) Unique Selling
Point (How well does the advertisement highlight the differences
between this product and other similar products?).

Although other dimensions could also be evaluated ([27, 7]),
it was decided to keep the evaluation process simple and
short, to facilitate the evaluators. The resulting questionnaire
was given to an expert evaluator (advertisement industry
professional) as well as to 1250 crowd workers (50 workers
per final advertisement), to also get the average user’s
opinion and capture the “Wisdom of Crowds” effect (crowds
can outperform the estimations of individual experts [15]).
Each worker assessed up to 5 advertisements to avoid working
memory cognitive overload [32].

Group interaction effectiveness and emotions 
Following the 2nd hypothesis, participants after the 3rd round
were given a questionnaire developed based on the emotions
classification study by Pekrun and colleagues [36] and their
Achievement Emotions Questionnaire. It assessed the following:
1) Motivation (How motivated did the worker feel
to participate in the group advertisement creation task), 2)
Stress (How stressed or frustrated the worker felt during her
interaction with the group), 3) End result (How satisfied the
worker was with her group’s end result), 4) Communication
quality (How happy she was with the quality of communication
among the group members), 5) Sharing confidence (How
confident the worker felt to share her opinion with the group),
6) Acceptance (How well did the group welcome the worker’s
contribution), 7) Opinion on cooperative tasks and 8) Interest
for re-invitation to similar tasks in the future. The questionnaire
also included open-ended questions over the workers’:
9) Face-to-face behavior (How the worker’s behavior would
be different in case the task was face-to-face), 10) Process
suggestions (What would the worker change in the overall
process) and 11) Other comments.

RESULTS 

Overall sample statistics 

Overall, in a population of 295 workers that took the personality
test of the 1st round, we observe the following:

• Leader types: D: 42.71%, I: 9.15%, D/I: 4.41% 

• Non-leader types: S: 13.56%, C: 13.90%, S/C: 4.07% 

• Mixed (all other combinations): 12.20% 

As can be observed, the crowd worker population is not evenly
distributed across personality types: there is a higher percentage
of leader types (56%) than of non-leader types (31.46%). Thus the
probability of having a randomly selected team with more
than one leader is high. This observation strengthens the
significance of our results, since if our hypotheses are
verified, this would mean that a team formation which does
not take into account personality compatibility risks a
sub-optimal result. Also, this made the selection process more
challenging since we needed to balance matching and crashing
group populations. From the 185 people who participated
in Round 2, 145 were invited to the 3rd round (in order to
have a balanced number of teams). In the end, team formation
was as follows: 29 groups, 5 workers each, of which 6
were CR/CM, 7 CR/CL, 7 M/CL and 7 M/CM.
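
To give a rough sense of how likely a leader clash is in a randomly formed team, the sketch below computes the probability that a team of 5 contains at least two members of a given type. It is only an approximation under an assumption not made in the study: team members are drawn independently (binomial model) with the type proportions reported above. The function name `prob_more_than_one` is hypothetical.

```python
from math import comb


def prob_more_than_one(p, team_size=5):
    """P(at least two members of a type), assuming independent draws.

    p: proportion of that type in the worker population.
    """
    q = 1.0 - p
    p_none = q ** team_size
    p_exactly_one = comb(team_size, 1) * p * q ** (team_size - 1)
    return 1.0 - p_none - p_exactly_one


# Proportions taken from the Round 1 sample statistics above.
p_dominant = 0.4271                      # D types only
p_any_leader = 0.4271 + 0.0915 + 0.0441  # D, I and D/I types

print(round(prob_more_than_one(p_dominant), 2))    # about 0.71
print(round(prob_more_than_one(p_any_leader), 2))  # about 0.88
```

Under these assumptions, roughly 7 in 10 random teams of 5 would contain two or more D types, which is in line with the observation that unplanned team formation is likely to produce crashing groups.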

Observations on Group behavior 

To answer the question “did the groups behave as expected?”,
a qualitative analysis of comment logs was performed.
Looking deeper into the group processes, a few very
interesting patterns were revealed. Most groups did behave
as expected. In fact, most tension among group members
was built in the Competitive Crashing groups. Ironic comments
were observed, as well as people shouting (using capital
letters): “Dear [participant number removed] Thank you for
your comment. I was waiting for such a remark. However,
this could be the reason why somebody will NEVER FORGET
this advertisement and for this reason REMEMBER TO
BUY sCOPA COFFEE THE NEXT TIME HE VISITS A
SUPERMARKET OR A COFFEE SHOP”. Apart from the sharp
comments, the competitive crashing groups also spent
a lot of time discussing the processes to follow, without
easily reaching an end result. In one case, the group did not
reach a final decision at all. Participant comments were
revealing: “Apart from that, it seems that we are unable to
agree on one advert being a perfect winner” ... “I’m beginning
to feel like this is really a psychology experiment, and
they want to see what we will do about people not participating”
... “Oh well, that didn’t go as planned, did it? I think
there will be a LOT for the academics to draw out from this
experience!”.


On the other side, we observed the efficiency of the matching
groups and especially the Matching Collaborative groups, which
not only seemed to easily reach a group decision, but also
created a positive and encouraging atmosphere: “To point it
out again: Good job, team!” ... “I hope everything is okay
with what we’ve done, great job everyone and good luck!” ...
“It was great working with you all” ... “Yes, good job team!
Hope to work with you in the future.. =)”. The participant
comments are presented exactly as written by the participants,
or with clearly indicated grammar, spelling or other
corrections inside brackets.

Further interesting observations can be drawn regarding leader
behavior. First, in the absence of leaders it was difficult
for the group to reach a decision and in fact in 2 out of 3
cases, the group did not reach a decision at all. Second, in
those Crashing conditions where the D leaders did not
participate actively, the groups tended to convert to Matching
and work without tension. Occasionally, in the cases when the
D leaders did not dynamically participate, other members took
charge. These people were either type I leaders (if this
personality type was actively participating) or even
non-leader types. Third, in cases of Crashing groups where
only up to 3 people participated, even if they were all D
leaders, they seemed to communicate efficiently. In these
cases, it seems that the group size is crucial, meaning that
strong leaders can cooperate as long as they are not more than
3 people in a group. Fourth, when the leaders (D and I)
participated, the group functioned well. In their absence,
however, the group crashed, until the leaders took over again
and the group regained control. Finally, in a group with a
strong D leader and other mixed types, the mixed types adopted
a non-leader approach and let the strong leader lead the
group.


Observations on Worker behavior 

Our next set of observations seeks to answer the question:
“Did the group members behave as expected?”. Analyzing
individual behavior in a group is a very challenging task,
since human behavior does not follow predetermined paths.
However, we tried to observe leader and non-leader behavior in
the different groups and it seems that the majority of
individuals behaved more or less as expected. Leader types
seemed to lead the groups and non-leader types seemed to
follow. We also observed differences between socio-emotional
leaders (personality type I) and task leaders (personality
type D). There were only 3 cases of groups with members
showing unexpected behavior, meaning that people categorized
as leaders behaved as non-leaders and vice versa.

The D personalities dominated the group processes in most
groups. Indicatively, D leaders with clear task orientation
determined the decision-making processes: “I put already all
the slogan below, in the decision page [it] will [be] easier
for us” ... “Let’s vote here. Reply with your vote (don’t
forget own ID). Last one to vote, do us a favour by
copy-pasting the winner’s (ID, Slogan, Scenario)” ... “Hello
friends, I leave this comment to remind those who have not yet
participated that have until September 7 to give their [vote,
so] that all participate.”.



Socio-emotional leaders (type I) focused on the group
interactions, encouraging other members: “I will not comment
mine, but I had a funny time doing it. congrats to all” ...
“That’s a great scenario [participant ID removed]. I like that
it emphasises the concept that the coffee can be drunk either
hot or cold. I took the liberty of adding a slogan to the
scenario! Feel free to change it if you disagree!” ... “well i
like the first idea since I was the one who wrote it, to be
fair its not that good and it can use some adjustments, tell
me what you think, and of all the ideas here i think No 5 is
fairly good”.

Hypothesis 1 Evaluation (Quality of final outcome) 

Expert evaluation. 

The groups’ final advertisements were evaluated by an expert
advertisement professional for their quality with regard to
originality, market value, understandability, honesty and
unique selling point (evaluation dimensions as explained
above, Hoffman, 2012). The overall score of each advertisement
(measured on a scale [0-50], i.e. the sum of scores of the 5
individual axes) was calculated and the scores of the 4
experimental conditions were compared using a Kruskal-Wallis
analysis. The expert’s ratings reveal a superiority of the
Matching Collaborative groups’ advertisements (mean score
24.71 out of 50). The worst end results came from the Crashing
Competitive groups (mean score 14 out of 50), followed by
Crashing Collaborative groups (mean 14.28) and Matching
Competitive groups (mean 17.43). The same pattern is also
observed for each of the 5 individual quality axes. The
overall evaluation rating of the expert is depicted in Figure 1.
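
As an illustration of the comparison described above, the
sketch below computes an advertisement's overall score as the
sum of its 5 axis scores, and the Kruskal-Wallis H statistic
across conditions. It is a plain-Python sketch, not the
analysis code used in the study: the function names and the toy
values are hypothetical, and the simple formula omits the tie
correction that statistical packages usually apply.

```python
from itertools import chain


def overall_score(axis_scores):
    """Overall score of one advertisement: the sum of its 5 axis scores."""
    if len(axis_scores) != 5:
        raise ValueError("expected one score per quality axis (5 axes)")
    return sum(axis_scores)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of score lists (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum**2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1


# Toy overall scores for 4 conditions (hypothetical, NOT the study's data).
toy_groups = [[1, 2], [3, 4], [5, 6], [7, 8]]
h_stat, dof = kruskal_wallis_h(toy_groups)
print(f"H({dof}) = {h_stat:.2f}")
```

With 4 conditions the statistic has 3 degrees of freedom and is
compared against a chi-square distribution, whose critical
value at the .05 level is about 7.81.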

Crowd evaluation. 

The crowd (N = 1250) agreed with the expert regarding the
higher quality outcome of the matching groups (especially in
the collaborative task), compared to the crashing ones. This
is statistically confirmed, with one-way ANOVA analyses for
all quality axes (indicatively for the overall quality rating
axis: F(3,1397) = 28.05, p < .001 and similar results with
highly significant p values < .001 for the individual axes).
We note nevertheless that the crowd consistently provided
higher marks than the expert. Figure 1 illustrates the average
crowd ratings, next to the respective expert ratings. From the
above, null hypothesis H01 is rejected.
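
For readers less familiar with the test, the following sketch
shows how a one-way ANOVA F statistic and its two degrees of
freedom are obtained from per-condition ratings. It is a
minimal plain-Python illustration under stated assumptions, not
the study's analysis code; the function name and the toy
ratings are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the condition means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Variation of individual ratings around their own condition mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy ratings for 3 conditions (hypothetical, NOT the study's data).
toy_ratings = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_ratings)
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

In this standard design the degrees of freedom are the number
of conditions minus 1 and the number of ratings minus the
number of conditions, so the reported F(3,1397) would
correspond to 4 conditions and 1401 individual ratings.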

Hypothesis 2 Evaluation (Group efficiency and emotions) 

Participant questionnaire answers were analyzed regarding
hypothesis 2. Three statistically significant results were
found, and their averages are depicted in Figure 3.

Communication Quality. 

The participants of the Matching Collaborative groups reported
the highest levels of satisfaction, followed by the Matching
Competitive, Crashing Collaborative and Crashing Competitive
groups. This result is statistically significant with
H(3) = 8.57, p < .05 and fully in line with hypothesis 2. This
quantitative result was further validated by qualitative
analysis of participant comments. Indicatively, comparing the
comments of a participant from a Crashing Collaborative group
(“Unfortunately, i didn’t have an active group in which a
discussion could be properly held. Either they were a bit
inactive, or their communication skills were a bit rusty. I
tried leading the group, since they were all pretty pleasing
with each others. Not many comments or edits were done. They
would mostly throw their original idea, and that was about
it.”), with the comment of a participant in a Matching
Collaborative group (“great task! feels like i was working
with team! i ll be very happy if i could do more tasks like
this in future”), we observe an obvious difference in opinion.
This was also reflected in the word clouds created (using the
Semantria software*) to visualize the content of intra-group
communication. Figure 2 provides one indicative word cloud
example per group category. Similar patterns were found for
most groups.

Figure 1. Hypothesis 1 Final Outcome Evaluation.

Stress Levels. 

Individuals in the matching conditions and especially in the
Matching Collaborative groups felt more relaxed and reported
significantly lower stress levels than the crashing groups and
especially the Crashing Competitive ones (H(3) = 7.87,
p < .05). This result shows that with regard to stress levels,
the personality compatibility (crashing or matching) does play
a role, especially when the task is of competitive nature. The
same pattern is revealed through the qualitative analysis.
Indicatively, although participants from the matching
conditions do not report any stress issues and are rather
pleased with the overall experience, a participant from a
Competitive Crashing group said: “...I was a little nerves
[nervous] in case I was “intruding” on regulars, but hopefully
next time I’ll have more confidence.”

End Result. 

People in the matching groups, and especially the
collaborative matching ones, were more pleased (H(3) = 8.23,
p < .05) with the group’s final result than the people in the
crashing conditions. This finding is also reflected in the
qualitative data. However, it is interesting to see the
in-between views of participants in the Competitive Matching
conditions. Although they liked the other group members and
enjoyed their interaction, the competitive nature of the task
left these participants with mixed feelings. Indicatively, a
worker mentioned: “Team result is good but I’m [a] little
disappoint[ed] because my hard work did not succeed. Overall I
am happy that finally we have a deserving winner.”

* Semantria, Lexalytics. https://semantria.com/

Finally, no statistically significant differences were found
with regard to motivation, sharing confidence and acceptance.
All workers reported that they were highly motivated to
participate (mean=2.89, SD=0.18), confident to share their
opinion (mean=2.81, SD=0.24) and felt relatively accepted by
their group (mean=2.58, SD=0.41). All participants, with no
statistically significant differences across the groups,
expressed satisfaction with cooperative crowdsourcing tasks
(mean=2.84, SD=0.21) and interest in being re-invited to
similar tasks in the future. Participants reported being
motivated to participate: “I enjoyed this task a lot,
specially the third round. Although there were differences in
opinions and moments when the member couldn’t even agree to
disagree, it was a fun and motivating experience.” They were
also pleased with the fact that they could share their ideas
with others: “it was a fun way of making and sharing opinions
with others with the benefits of doing something important”.

Overall, most participants were happy with a cooperative
crowdsourcing task: “a very creative way to make people
working together” ... “I really liked this job and looking
forward to participate further in such jobs!” ... “I’m very
happy that I had the chance to participate in this test. I
believe that it would be very interesting to see more task[s]
like this in the future, tasks that make workers think and
express themselves on different ideas. Thanks so much,” ...
“Thanks very much for this task, it was the one I have enjoyed
the most since I started doing tasks. Being able to be
creative while having a chance to work with others was a
really, really great experience.”

Many people suggested that synchronous communication would be
beneficial for this kind of task, and many believed that they
would be more active in a face-to-face task. A few
participants, however, raised concerns: “...I would be less
comfortable in a face to face situation” ... “I would be more
quiet. I’m more confident when I’m writing. I was the first to
comment on the Google document, but if this was a face to face
task I believe I would listen to other people’s opinions
before speaking” ... “face2face would be easier (less pressure
on language writing skills), but it would be less convincing,
face2face required specific times and that can be a big
problem” ... “if it was face-to-face task i would be more
emotional because there wouldn’t be time to calm down, if i
don’t like something”. Thus, it seems that a synchronous or
face-to-face interaction would be better for some but not all
participants. However, it is definitely worth exploring
further in a future study.

Since quite a few participants believed that this was a real
advertisement job, they suggested improvements regarding the
efficiency of the advertisement development. For example, a
participant said: “Probably best if you removed part 3 and you
guys at the sCOPa mkt department just pick an end result.”
However, in general the vast majority of the participants were
very happy with the job, thanking the research team. There are
numerous comments in that direction. Indicatively: “Great
task! one of my favorite[s] so far” ... “I would love to do
this job again in the future! Thanks!” ... “Looking forward to
working on more collaborative projects”. From the above, null
hypothesis H02 is partially rejected, with regard to the
Communication Quality, Stress Levels and End Result axes.
Figure 3. Hypothesis 2 - Participant emotions during group interaction
and self-perceptions of group efficiency.


DISCUSSION, LIMITATIONS AND FUTURE WORK 

Our analysis showed that the crowdsourcing population seems to
have significantly higher percentages of leader (D, I) than
non-leader types (S, C). Thus the probability of creating a
crashing team if selecting randomly is high. This reinforces
the significance of our research, enabling a more effective
selection of workers through the creation of more compatible
teams and thus the achievement of a higher-quality result.

Although all participants were pleased with the cooperative
nature of the tasks, people in the matching conditions and
especially the collaborative ones reported better group
communication and lower stress levels, and liked the end
result more (statistical significance of hypothesis 2). The
importance of having happy and relaxed workers is well studied
in organizational psychology [48], and it has also been
indicated in crowdsourcing settings [21]. Among the different
intrinsic motivators known to affect crowdsourcing quality
output (e.g. reputation, satisfaction with the task etc.),
this research adds that personality matching in groups can be
another powerful intrinsic motivator for work.

Groups and individuals mostly behaved as expected while
interacting with the group, implying that the DISC tool has a
good prognostic value. DISC also highly correlates with
another well-known, valid and reliable tool, MBTI, gaining
further convergent validity [40].

The statistical significance of hypothesis 1 brings along
practical benefits for crowdsourcing task designers and
crowdsourcing platforms. Specifically, through a relatively
easy approach (a personality test and group matching) worker
productivity in group tasks can be significantly increased.
The same outcome can be potentially beneficial for other
applications, where group tasks among previously unknown
individuals can take place, such as learning applications or
corporate settings.

Figure 2. Word clouds of team discussions. From left to right: Crashing Competitive, Crashing Collaborative, Matching Competitive and Matching
Collaborative. Word/concept frequency indicated by word size. Positive sentiments in green, negative sentiments in red, neutral in black.

Concerning ethics, we observed that it was very easy for
people to reveal their personality traits (almost 300
responses were collected in only 2 hours). While this study
strictly followed all ethical research guidelines, this is not
guaranteed in commercial practices. Future work could also
examine the reasons why people are ready to give their
personal data and how personality can be used correctly and
with integrity in crowdsourcing applications.

Finally, our results are valid only for the specific task
types that were studied (collaborative/competitive, creative,
of short duration, relatively easy, with low leader control
etc.). Other task types could be affected in different ways,
or even not at all, by personality matching within the group.
For example, routine tasks (as opposed to creative) or tasks
with high leader control could be affected less: in the first
case because personality does not need to be expressed, and in
the second because the team members’ roles are clearly
predefined. Future work could examine the proposed approach
under the scope of different task types, varying the values of
the different elements presented in Table 1 of our research
design.

CONCLUSION 

In this work we examined the impact of personality
compatibility on the effectiveness of group work in
cooperative crowdsourcing. Our results, on two main types of
personality combinations (matching or crashing) and on two
main types of tasks (collaborative and competitive), show that
indeed the way people are placed together can significantly
affect the final outcome of the team, as well as the emotions
and satisfaction of the individual team members. Specifically,
we showed that teams with matching personalities perform
better and are more satisfied than teams with crashing
personalities. This is especially true for matching
collaborative groups, although statistically significant
differences were found among all four group combinations. Our
results are even more important keeping in mind that in
crowdsourcing, the probability of coming up with a crashing
team is high, due to the high percentage of leader
personalities observed in the crowd worker population. This
work is the first to examine the effect of personality over
team result in crowdsourcing settings. Its results have
practical implications for crowdsourcing platforms and task
designers, who want to leverage crowdsourced team work and
improve its outcomes. We hope that the present research will
be a first step in a new field, one that will examine
personality aspects in crowdsourced group activities, and that
more researchers will be inspired to continue this effort.